Email Category Prediction
Abstract
According to recent estimates, about 90% of the emails that consumers receive are machine-generated. Such messages include shopping receipts, promotional campaigns, newsletters, and booking confirmations. Most of them are created by populating a fixed template with a small amount of personalized information, such as a name, salutation, reservation number, or date. Web mail providers (Gmail, Hotmail, Yahoo) are leveraging the structured nature of such emails to extract salient information and use it to improve the user experience, e.g. by automatically entering reservation data into a user's calendar or by sending alerts about upcoming shipments. To facilitate these extraction tasks it is helpful to classify templates according to their category, e.g. restaurant reservations or bill reminders, since each category triggers a particular user experience. Recent research has focused on discovering the causal thread of templates, e.g. inferring that a shopping order is usually followed by a shipping confirmation, or that an airline booking is followed by a confirmation and then by a “ready to check in” message. Gamzu et al. took this idea one step further by implementing a method to predict the template category of future emails for a given user based on previously received templates. The motivation is that predicting future emails has a wide range of potential applications, including better user experiences (e.g. warning users of items ordered but not shipped), targeted advertising (e.g. users who recently made a flight reservation may be interested in hotel reservations), and spam classification (a message that is part of a legitimate causal thread is unlikely to be spam). The gist of the Gamzu et al. approach is to model the problem as a Markov chain, where the nodes are templates or temporal events (e.g. the first day of the month).
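To make the Markov chain idea concrete, here is a minimal sketch of a first-order chain over email categories. It is an illustration only, not the Gamzu et al. method: it ignores temporal-event nodes, and the function names, category names, and sample histories are all hypothetical.

```python
from collections import Counter, defaultdict


def fit_transitions(sequences):
    """Count how often each category is immediately followed by another."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return counts


def predict_next(counts, current):
    """Return (category, probability) pairs for the next email, most likely first."""
    followers = counts.get(current)
    if not followers:
        return []  # category never seen with a successor
    total = sum(followers.values())
    return [(cat, n / total) for cat, n in followers.most_common()]


# Hypothetical per-user histories of template categories, in arrival order.
histories = [
    ["order", "shipping", "delivery"],
    ["order", "shipping", "promo"],
    ["flight_booking", "flight_confirmation", "check_in"],
    ["order", "promo"],
]

model = fit_transitions(histories)
ranked = predict_next(model, "order")
print(ranked)
```

The estimated transition probabilities are simply normalized follow-up counts, so the prediction depends only on the most recent category. Conditioning on longer histories is one limitation that sequence models such as LSTMs are designed to address.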
This paper expands on their work by investigating the use of neural networks for predicting the category of emails that will arrive during a fixed-size time window in the future. We consider two types of neural networks: multilayer perceptrons (MLP), a type of feedforward neural network; and long short-term memory (LSTM), a type of recurrent neural network. For each type of neural network, we explore the effects of varying their configuration (e.g. number of layers or number of neurons) and hyper-parameters (e.g. drop-out ratio). We find that neural networks vastly outperform the Markov chain approach in prediction accuracy, and that LSTMs perform slightly better than MLPs. We offer some qualitative interpretation of our findings and identify some promising future directions.

The work was completed at Google Research. ©2017 International World Wide Web Conference Committee (IW3C2), published under Creative Commons CC-BY-NC-ND 2.0 License. WWW 2017 Companion, April 3–7, 2017, Perth, Australia. ACM 978-1-4503-4914-7/17/04. http://dx.doi.org/10.1145/3041021.3055166
Publication date: 2017